Speeding Up Index Construction with GPU for DNA Data Sequences
Abstract
The advancement of technology in the scientific community has produced terabytes of biological data, including DNA sequences. String matching algorithms, traditionally used to match DNA sequences, now take much longer to execute because of the large size of DNA data and the small DNA alphabet (A, C, G, T). To overcome this problem, indexing methods such as suffix arrays and suffix trees have been introduced. In this study we used suffix arrays as the indexing structure because they are more widely applicable, less complex, and use less space than suffix trees. A parallel method is then introduced to speed up index construction. A graphics processing unit (GPU) is used to parallelize one segment of the indexing algorithm, namely the sorting step of suffix array construction. Our results show that the GPU accelerates construction of the suffix array index by a factor of 1.68 compared with construction without the GPU.
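The abstract does not say which suffix array construction algorithm was used. The sketch below therefore assumes prefix doubling, a technique swapped in here because each round reduces to sorting integer key pairs, the kind of step that can be handed to a GPU sort. It builds a suffix array for a short DNA string and then answers pattern queries by binary search over that array. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and Python's built-in sort stands in for the GPU sort.

```python
def build_suffix_array(text: str) -> list[int]:
    """Suffix array by prefix doubling (sketch; a CPU sort stands in for a GPU sort)."""
    n = len(text)
    if n == 0:
        return []
    rank = [ord(c) for c in text]  # initial rank: first character of each suffix
    sa = list(range(n))
    k = 1
    while True:
        # Key of suffix i: rank of its first k characters, then rank of the next
        # k characters (-1 if the suffix ends first, so the shorter one sorts earlier).
        keys = [(rank[i], rank[i + k] if i + k < n else -1) for i in range(n)]
        # This sort plays the role of the sorting step offloaded to the GPU.
        sa.sort(key=lambda i: keys[i])
        # Re-rank: suffixes with equal keys share a rank.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (keys[sa[j]] != keys[sa[j - 1]])
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: the order is final
            return sa
        k *= 2


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Start positions of pattern in text, found by binary search over the suffix array."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For example, `build_suffix_array("GATTACA")` returns `[6, 4, 1, 5, 0, 3, 2]`, and `find_occurrences("GATTACA", sa, "A")` with that array returns `[1, 4, 6]`. Once the index exists, each query costs O(m log n) character comparisons instead of a scan of the whole sequence. This is why building the index, and the sort that dominates it, is the part worth accelerating.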
Similar Resources
GPU-Accelerated BWT Construction for Large Collection of Short Reads
Advances in DNA sequencing technology have stimulated the development of algorithms and tools for processing very large collections of short strings (reads). Short-read alignment and assembly are among the most well-studied problems. Many state-of-the-art aligners, at their core, have used the Burrows-Wheeler transform (BWT) as a main-memory index of a reference genome (typical example, NCBI hu...
Efficient Implementation of MrBayes on Multi-GPU
MrBayes, using Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)³), is a popular program for Bayesian inference. As a leading method of using DNA data to infer phylogeny, the (MC)³ Bayesian algorithm and its improved and parallel versions are now not fast enough for biologists to analyze massive real-world DNA data. Recently, graphics processor unit (GPU) has shown its power as a ...
No Title Given
We introduce two novel techniques for speeding up the generation of digital (t, s)-sequences. Based on these results a new algorithm for the construction of Owen’s randomly permuted (t, s)-sequences is developed and analyzed. An implementation is available at http://www.mcqmc.org/Software.html.
Self-Supervised Clustering for Codebook Construction: An Application to Object Localization
Approaches to object localization based on codebooks do not exploit the dependencies between appearance and geometric information present in training data. This work addresses the problem of computing a codebook tailored to the task of localization by applying regularization based on geometric information. We present a novel method, the Regularized Combined Partitional-Agglomerative clustering,...
Speeding Up GPU Graph Processing Using Structural Graph Properties
The Problem: We want the fastest graph processing! • High-performance graph processing is very interesting for data science • High-performance computing is increasingly GPU/accelerator based • Mapping irregular (graph) algorithms to GPU is hard • Performance of irregular algorithms is data-dependent. Thesis Goals: • Quantify performance impact of data dependence • Model...